Sublinear Time Approximation of Text Similarity Matrices

Abstract

We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for n data points requires Omega(n^2) similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., transformer models. Approximation methods reduce this quadratic complexity, often by using a small subset of exactly computed similarities to approximate the remainder of the complete pairwise similarity matrix. Significant work focuses on the efficient approximation of positive semidefinite (PSD) similarity matrices, which arise, e.g., in kernel methods. However, much less is understood about indefinite (non-PSD) similarity matrices, which often arise in NLP. Motivated by the observation that many of these matrices are still somewhat close to PSD, we introduce a generalization of the popular Nyström method to the indefinite setting. Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix, producing a rank-s approximation with just O(ns) similarity computations. We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices arising in NLP tasks. We demonstrate high accuracy of the approximated similarity matrices in the downstream tasks of document classification, sentence similarity, and cross-document coreference.
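
To make the O(ns) claim concrete, here is a minimal sketch of the classical Nyström approximation for a PSD similarity matrix. It is not the indefinite generalization the abstract introduces, and the names `solve`, `nystrom_factors`, and `approx_entry` are hypothetical helpers chosen for illustration. The sketch evaluates the similarity function only against s landmark points (n * s calls) and keeps the result in factored form, so the full n x n matrix is never built.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    size = len(a)
    # Augment a with b so every row operation applies to both at once.
    aug = [list(a[i]) + list(b[i]) for i in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("landmark block is singular; choose other landmarks")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def nystrom_factors(
    points: Sequence,
    sim: Callable[[object, object], float],
    landmarks: Sequence[int],
) -> Tuple[Matrix, Matrix]:
    """Return factors (C, Z) such that K is approximated by C @ Z.

    C is n x s and Z = W^{-1} C^T is s x n, where W is the s x s
    similarity block between the landmarks themselves.
    """
    # The only similarity evaluations: n * s calls, one per (point, landmark).
    C = [[sim(p, points[j]) for j in landmarks] for p in points]
    # W reuses rows of C at the landmark positions; no extra evaluations.
    W = [C[i] for i in landmarks]
    C_t = [list(col) for col in zip(*C)]
    Z = solve(W, C_t)
    return C, Z


def approx_entry(C: Matrix, Z: Matrix, i: int, j: int) -> float:
    """Approximate K[i][j] from the factors in O(s) time."""
    return sum(C[i][k] * Z[k][j] for k in range(len(Z)))
```

When the true matrix is PSD with rank at most s and the landmark block W is invertible, this reconstruction is exact; otherwise it is only an approximation, and a singular or ill-conditioned W (which is more likely for indefinite matrices) needs a pseudo-inverse or the kind of correction the paper studies.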

Related resources

Text Indexing and Searching in Sublinear Time

We introduce the first index that can be built in o(n) time for a text of length n, and also queried in o(m) time for a pattern of length m. On a constant-size alphabet, for example, our index uses O(n log^(1/2+ε) n) bits, is built in O(n / log^(1/2−ε) n) deterministic time, and finds the occ pattern occurrences in time O(m / log n + √(log n) log log n + occ), where ε > 0 is an arbitrarily small constant. As a compa...

Sublinear Approximation of Signals

It has recently been observed that sparse and compressible signals can be sketched using very few nonadaptive linear measurements in comparison with the length of the signal. This sketch can be viewed as an embedding of an entire class of compressible signals into a low-dimensional space. In particular, d-dimensional signals with m nonzero entries (m-sparse signals) can be embedded in O(m log d...

Sublinear Graph Approximation Algorithms

Motivation: we want to learn a combinatorial parameter of a graph, such as the maximum matching size, the independence number α(G), the minimum vertex cover size, or the minimum dominating set size. (From slides by Krzysztof Onak.)

Improved Approximation Guarantees for Sublinear-Time Fourier Algorithms

In this paper modified variants of the sparse Fourier transform algorithms from [32] are presented which improve on the approximation error bounds of the original algorithms. In addition, simple methods for extending the improved sparse Fourier transforms to higher dimensional settings are developed. As a consequence, approximate Fourier transforms are obtained which will identify a near-optima...

Sublinear-Time Approximation for Clustering Via Random Sampling

In this paper we present a novel analysis of a random sampling approach for three clustering problems in metric spaces: k-median, min-sum k-clustering, and balanced k-median. For all these problems we consider the following simple sampling scheme: select a small sample set of points uniformly at random from V and then run some approximation algorithm on this sample set to compute an approximati...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i7.20779